NSF PAR Search | NSF Public Access Repository


Data Discovery and Indexing for Semi-Structured Scientific Data

https://doi.org/10.5220/0012706000003690

Jagini, Kaushik; Zhang, Yifan; Guo, Yichen; Goddy, Julian; Stansberry, Dale; Agar, Joshua; Heflin, Jeff (April 2024, SCITEPRESS - Science and Technology Publications)

Full Text Available
An Evaluation of Strategies to Train More Efficient Backward-Chaining Reasoners

https://doi.org/10.1145/3587259.3627564

Jia, Yue-Bo; Johnson, Gavin; Arnold, Alex; Heflin, Jeff (December 2023, ACM)

Knowledge bases traditionally require manual optimization to ensure reasonable performance when answering queries. We build on previous work on training a deep learning model to learn heuristics for answering queries by comparing different representations of the sentences contained in knowledge bases. We decompose the problem into issues of representation, training, and control and propose solutions for each subproblem. We evaluate different configurations on three synthetic knowledge bases. In particular, we compare a novel representation approach based on learning to maximize similarity of logical atoms that unify and minimize similarity of atoms that do not unify, to two vectorization strategies taken from the automated theorem proving literature: a chain-based and a 3-term-walk strategy. We also evaluate the efficacy of pruning the search by ignoring rules with scores below a threshold.
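
The pruning idea in the last sentence of this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses propositional Horn rules rather than first-order atoms with unification, and the `score` function, the `threshold` value, and the hand-assigned scores below are hypothetical stand-ins for a learned scoring model.

```python
def prove(goal, facts, rules, score, threshold, depth=10):
    """Backward-chain on `goal`, ignoring rules scored below `threshold`.

    `rules` is a list of (head, body) pairs, where body is a tuple of subgoals.
    `score(goal, rule)` is an assumed stand-in for a learned heuristic.
    """
    if goal in facts:
        return True
    if depth == 0:  # depth limit guards against cyclic rule sets
        return False
    # Keep only rules that conclude the goal and pass the score threshold.
    candidates = [r for r in rules if r[0] == goal and score(goal, r) >= threshold]
    # Try the most promising rules first.
    candidates.sort(key=lambda r: score(goal, r), reverse=True)
    for _head, body in candidates:
        if all(prove(sub, facts, rules, score, threshold, depth - 1) for sub in body):
            return True
    return False


# Toy knowledge base with hand-assigned (hypothetical) rule scores.
facts = {"a", "b"}
rules = [("c", ("a", "b")), ("c", ("d",)), ("d", ("c",))]
scores = {rules[0]: 0.9, rules[1]: 0.2, rules[2]: 0.5}

def lookup_score(goal, rule):
    return scores[rule]

print(prove("c", facts, rules, lookup_score, threshold=0.5))
```

Raising the threshold trims more of the search space, but it can also discard the only rule that leads to a proof, so a threshold-pruned search of this kind trades completeness for speed.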
Full Text Available
Learning a More Efficient Backward-Chaining Reasoner

Arnold, Alex; Heflin, Jeff (November 2022, The Tenth Annual Conference on Advances in Cognitive Systems (ACS-2022))

Full Text Available
Truth in a sea of data: adoption and use of data search tools among researchers and journalists

https://doi.org/10.1080/1369118X.2022.2147398

Jia, Haiyan; Miller, Larrisa I.; Hicks, Jessica; Moscot, Ethan; Landberg, Alissa; Heflin, Jeff; Davison, Brian D. (November 2022, Information, Communication & Society)

The increasing availability of data search tools brings opportunities for non-expert users. Among these users, interdisciplinary researchers and data journalists represent a growing population whose work can lead to societal benefit. Through in-depth interviews, we examine what strategies and approaches researchers and journalists adopt to search online data, how they apply current technology to facilitate dataset search, and the barriers and difficulties that they encounter in their work with data. Our findings reveal that with technological limitations in the aspects of searchability, interactivity and usability, dataset search for non-experts remains a challenge. We have found that little attention has been paid to non-experts’ emerging data need, significantly constraining the design and development of technological tools for supporting non-expert users. Our findings underline the critical impact of the design, development and deployment of technological tools to enable the meaningful use of today’s increasingly available data toward a civil society.
Full Text Available
StruBERT: Structure-aware BERT for Table Search and Matching

https://doi.org/10.1145/3485447.3511972

Trabelsi, Mohamed; Chen, Zhiyu; Zhang, Shuo; Davison, Brian D.; Heflin, Jeff (April 2022, Proceedings of the ACM Web Conference 2022)

A table is composed of data values that are organized in a 2D matrix with rows and columns providing implicit structural information. A table is usually accompanied by secondary information such as the caption, page title, etc., that form the textual information. Understanding the connection between the textual and structural information is an important, yet neglected aspect in table retrieval, as previous methods treat each source of information independently. In this paper, we propose StruBERT, a structure-aware BERT model that fuses the textual and structural information of a data table to produce context-aware representations for both textual and tabular content of a data table. We introduce the concept of horizontal self-attention, which extends the idea of vertical self-attention introduced in TaBERT and allows us to treat both dimensions of a table equally. StruBERT features are integrated in a new end-to-end neural ranking model to solve three table-related downstream tasks: keyword- and content-based table retrieval, and table similarity. We evaluate our approach using three datasets, and we demonstrate substantial improvements in terms of retrieval and classification metrics over state-of-the-art methods.
Full Text Available
Neural ranking models for document retrieval

https://doi.org/10.1007/s10791-021-09398-0

Trabelsi, Mohamed; Chen, Zhiyu; Davison, Brian D.; Heflin, Jeff (December 2021, Information Retrieval Journal)

Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, so that they overcome the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components, and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images and videos.
Full Text Available
Exploring Datasets via Cell-Centric Indexing

Heflin, Jeff; Davison, Brian D.; Jia, Haiyan (September 2021, Proceedings of DESIRES 2021: Design of Experimental Search and Information REtrieval Systems, CEUR Workshop Proceedings)
Alonso, Omar; Marchesin, Stefano; Najork, Mark; Silvello, Gianmaria (Ed.)
We present a novel approach to dataset search and exploration. Cell-centric indexing is a unique indexing strategy that enables a powerful, new interface. The strategy treats individual cells of a table as the indexed unit, and combining this with a number of structure-specific fields enables queries that cannot be answered by a traditional indexing approach. Our interface provides users with an overview of a dataset repository, and allows them to efficiently use various facets to explore the collection and identify datasets that match their interests.
Full Text Available
MGNETS: Multi-Graph Neural Networks for Table Search

https://doi.org/10.1145/3459637.3482140

Chen, Zhiyu; Trabelsi, Mohamed; Heflin, Jeff; Yin, Dawei; Davison, Brian D. (October 2021, Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM))

Table search aims to retrieve a list of tables given a user's query. Previous methods only consider the textual information of tables and the structural information is rarely used. In this paper, we propose to model the complex relations in the table corpus as one or more graphs and then utilize graph neural networks to learn representations of queries and tables. We show that the text-based table retrieval methods can be further improved by graph-based predictions which fuse multiple field-level information.
Full Text Available
Towards Knowledge Acquisition of Metadata on AI Progress

Chen, Zhiyu; Trabelsi, Mohamed; Davison, Brian D.; Heflin, Jeff (October 2020, CEUR Workshop Proceedings)
Taylor, Kerry; Gonçalves, Rafael; Lecue, Freddy; Yan, Jun (Ed.)
We propose an ontology to help AI researchers keep track of the scholarly progress of AI related tasks such as natural language processing and computer vision. We first define the core entities and relations in the proposed Machine Learning Progress Ontology (MLPO). Then we describe how to use the techniques in natural language processing to construct a Machine Learning Progress Knowledge Base (MPKB) that can support various downstream tasks.
Full Text Available
An Exploratory Interface for Dataset Repositories Using Cell-Centric Indexing

https://doi.org/10.1109/BigData50022.2020.9378057

Johnson, Drake; Register, Keith; Davison, Brian D.; Heflin, Jeff (December 2020, IEEE International Conference on Big Data (IEEE BigData 2020))
Large collections of datasets are being published on the Web at an increasing rate. This poses a problem to researchers and data journalists who must sift through these large quantities of data to find datasets that meet their needs. Our solution to this problem is cell-centric indexing, a novel approach which considers the individual cell of a dataset to be the fundamental unit of search, indexing the corresponding metadata to each individual cell. This facilitates a new style of user interface that allows users to explore the collection via histograms that show the distributions of various terms organized by how they are used in the dataset.
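
As a rough illustration of the idea described in this abstract, the sketch below builds one index record per cell and counts how a search term is used across the collection. It is an assumption-laden toy, not the authors' system: the field names (`dataset`, `column`, `content`), the function names, and the sample data are all hypothetical.

```python
from collections import Counter


def index_cells(datasets):
    """Return one record per cell, each carrying its dataset and column metadata."""
    records = []
    for name, table in datasets.items():
        for row in table["rows"]:
            for column, value in zip(table["columns"], row):
                records.append({
                    "dataset": name.lower(),
                    "column": column.lower(),
                    "content": str(value).lower(),
                })
    return records


def field_histogram(records, term):
    """Count the cells in which `term` appears, grouped by the field it appears in."""
    term = term.lower()
    counts = Counter()
    for record in records:
        for field, text in record.items():
            if term in text.split():
                counts[field] += 1
    return counts


# Hypothetical two-table repository.
datasets = {
    "city population": {
        "columns": ["city", "population"],
        "rows": [["Bethlehem", 75000], ["Allentown", 125000]],
    },
    "transit routes": {
        "columns": ["route", "city"],
        "rows": [["Blue", "Bethlehem"]],
    },
}

records = index_cells(datasets)
print(field_histogram(records, "city"))
```

Because every cell carries its own context, the same term can be counted separately as a dataset title word, a column name, or a cell value, which is the kind of distribution a histogram-based interface could display.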
Full Text Available
